Supporting Knowledge Discovery in Data Stream Management Systems
Author
Abstract
Data mining is an active area of research, and on-line mining in particular has gained significant momentum in recent years. The changing data characteristics and real-time response constraints of streaming data preclude the use of existing mining algorithms that were designed for stored datasets. Researchers are therefore proposing new fast, lightweight algorithms for on-line mining tasks such as classification, clustering, frequent-itemset mining, pattern matching, and many others. Beyond the research problems posed by the design of individual algorithms for different mining methods, there remains the issue that these algorithms must be integrated into an Inductive Data Stream Mining System that supports (i) libraries of interoperable mining methods, (ii) all essential functions of data stream management systems, such as continuous query optimization, load shedding, synoptic constructs, and non-stop computing, and (iii) ease of use and extensibility. The design of such a system has received little attention in the past, and thus offers an open opportunity for research contributions in my thesis work. There will also be an opportunity to advance the state of the art in mining algorithms, particularly in terms of extensibility, interoperability, and genericity of data representation. In this prospectus, I briefly discuss data stream management systems, data mining methods and systems, and other areas closely related to the topic of this thesis. I then present recently obtained preliminary results, including data representations that enable generic implementations of many on-line mining algorithms. I also discuss advanced techniques such as ensemble-based bagging and boosting, for which generic implementations were devised as well. Finally, preliminary experiments are presented to verify the efficiency of the proposed approach. Future work will focus on further experiments and fine-tuning of the mining algorithms; support for high-level mining languages is another promising topic for future research.
Similar resources
A data mining approach to employee turnover prediction (case study: Arak automotive parts manufacturing)
Training and adaptation of employees consume both time and money. Employee turnover can be predicted from organizational and personal historical data in order to reduce probable losses to organizations. Prediction methods are closely tied to human resource management, as they extract patterns from historical data. This article implements knowledge discovery steps on real data of a manufacturing pla...
Supporting Environmental Information Systems and Services Realization with the Geo-Spatial and Streaming Dimensions of the Semantic Web
Environmental Information Systems and Services require flexible discovery and chaining of distributed environmental services to support a large number of concurrent decision processes. The ability to cope with geo-spatial features of the environment and to process in real time huge and possibly noisy data streams are two critical factors in supporting such decision processes. Solution to separa...
The Relationship between Knowledge Management and the Process of Entrepreneurship in Sport Organizations
In the current competitive world, organizations that support entrepreneurship by providing the required tools can gain a competitive advantage. One of the most important tools for developing entrepreneurship, which was neglected in previous studies, is organizational knowledge management. The present paper aims to shed light on the role and importance of knowledge management in sport entreprene...
Designing data analysis services in the Knowledge Grid
Grid environments were originally designed for dealing with problems involving compute-intensive applications. Today, however, grids enlarged their horizon as they are going to manage large amounts of data and run business applications supporting consumers and end users. To face these new challenges, grids must support adaptive data management and data analysis applications by offering resource...
An integrated framework for knowledge-based modeling and simulation of natural systems
This paper proposes a new approach to simulation modeling of natural systems in the context of water quality modeling in streams affected by point source pollution. The approach has a potential for application to other domains of natural resource modeling. Its conceptual basis is knowledge-based simulation and systems analysis. In the approach presented in this paper, a stream or its section is...